Notes on Learning Probabilistic Automata
Author
Alberto Apostolico
Abstract
Probabilistic models of various classes of sources are developed in the context of coding and compression as well as in machine learning and classification. In the first domain, the repetitive structures of substrings are regarded as redundancies and sought to be removed. In the second, repeated subpatterns are unveiled as carriers of information and structure. In both contexts, one rather pervasive problem is that of learning or estimating probabilities from the observed strings. For most probabilistic models, such a task poses interesting algorithmic questions (cf., e.g., the references). A popular approach to the statistical modeling of sequences relies on the structure of uniform, fixed-memory Markov models. For sequences in important families, the autocorrelation or "memory" exhibited decays exponentially fast with length. In other words, there is a maximum length L of the recent history of a sequence above which the empirical probability distribution of the next symbol, given the last L′ symbols for any L′ > L, does not change appreciably. It is possible and customary to model these sources by Markov chains of order L, this maximum useful memory length. Even so, such automata tend in practice to be unnecessarily bulky and computationally imposing, both during their synthesis and during their use. In [6], much more compact, tree-shaped variants of probabilistic automata (PSTs) are built, which assume an underlying Markov process of variable memory length not exceeding some maximum L. The probability distribution generated by such an automaton is equivalent to that of a Markov chain of order L, but the description of the automaton itself is much more succinct. The process of learning the automaton from a given training set S of sequences requires Θ(Ln²) worst-case time, where n is the total length of the sequences in S and L is the length of a longest substring of S to be considered for a candidate state in the automaton. Once the automaton is built, predicting the likelihood of a query sequence of m characters may cost Θ(m²) time in the worst case. This work introduces automata equivalent to PSTs that can be learned in O(n) time, and also discusses notions of empirical probability and their efficient computation. Details of the learning procedure and of a linear-time classifier or parser may be found in [2, 3].
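To make the variable-memory idea concrete, the following minimal Python sketch estimates empirical next-symbol probabilities conditioned on contexts of length at most L and uses them to score a query string. It only illustrates the kind of statistics a PST summarizes; it is not the O(n) construction of [2, 3], and the names fit_contexts, next_symbol_prob, and log_likelihood are ours, not the paper's.

from collections import defaultdict
from math import log

def fit_contexts(training, L):
    """Count next-symbol occurrences after every context of length <= L."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(training)):
        for k in range(min(L, i) + 1):
            context = training[i - k:i]       # the k symbols preceding position i
            counts[context][training[i]] += 1
    return counts

def next_symbol_prob(counts, context, symbol, L):
    """Empirical P(symbol | context), backing off to shorter suffixes of the context."""
    for k in range(min(L, len(context)), -1, -1):
        ctx = context[len(context) - k:]
        total = sum(counts[ctx].values())
        if total and counts[ctx][symbol]:
            return counts[ctx][symbol] / total
    return 1e-9                               # crude floor for events never observed

def log_likelihood(counts, query, L):
    """Score a query string by summing log-probabilities of its symbols."""
    return sum(log(next_symbol_prob(counts, query[:i], query[i], L))
               for i in range(len(query)))

counts = fit_contexts("abracadabra", L=3)
print(log_likelihood(counts, "abra", L=3))

An actual PST additionally prunes contexts whose conditional distribution differs little from that of their shorter suffix, which is what keeps the model compact compared with a full order-L Markov chain.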
Similar articles
A Link Prediction Method Based on Learning Automata in Social Networks
Nowadays, online social networks are considered one of the most important emerging phenomena of human societies. In these networks, link prediction relies on existing knowledge of the interactions between network actors to estimate the probability that a new relationship will form in the future. A wide range of applications can be found for link prediction, such as electro...
Modularity in Coalgebra
This paper gives an overview of recent results concerning the modular derivation of (i) modal specification logics, (ii) notions of simulation together with logical characterisations, and (iii) sound and complete axiomatisations, for systems modelled as coalgebras of functors on Set. Our approach applies directly to an inductively defined class of coalgebraic types, which subsumes several types ...
Probabilistic pi-Calculus and Event Structures
This paper proposes two semantics of a probabilistic variant of the π-calculus: an interleaving semantics in terms of Segala automata and a truly concurrent semantics in terms of probabilistic event structures. The key technical point is the use of types to identify a good class of non-deterministic probabilistic behaviours which can preserve the compositionality of the parallel operator in the eve...
(SEN-1004) 10th International Workshop on Coalgebraic Methods in Computer Science
The powerset construction is a standard method for converting a non-deterministic automaton into an equivalent deterministic one as far as language is concerned. In this short paper, we lift the powerset construction on automata to the more general framework of coalgebras with enriched state spaces. Examples of applications include transducers and Rabin probabilistic automata.
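As an aside on the snippet above: the classical powerset (subset) construction it refers to can be sketched in a few lines. The sketch below is a plain-set formulation for illustration, not the coalgebraic generalization that paper develops; the NFA encoding (a dict mapping states to symbol-to-successor-set maps) and the example automaton are assumptions of ours.

from itertools import chain

def determinize(nfa, start, accepting):
    """Subset construction: nfa maps state -> {symbol: set of successor states}."""
    alphabet = set(chain.from_iterable(t.keys() for t in nfa.values()))
    dfa = {}                                  # frozenset of NFA states -> {symbol: frozenset}
    start_set = frozenset([start])
    worklist = [start_set]
    while worklist:
        subset = worklist.pop()
        if subset in dfa:
            continue
        dfa[subset] = {}
        for a in alphabet:
            target = frozenset(s for q in subset for s in nfa.get(q, {}).get(a, ()))
            dfa[subset][a] = target
            if target not in dfa:
                worklist.append(target)
    dfa_accepting = {S for S in dfa if S & accepting}
    return dfa, start_set, dfa_accepting

# Example: NFA accepting strings over {a, b} whose second-to-last symbol is 'a'.
nfa = {0: {'a': {0, 1}, 'b': {0}}, 1: {'a': {2}, 'b': {2}}, 2: {}}
dfa, q0, acc = determinize(nfa, 0, accepting={2})
print(len(dfa), "DFA states")                 # 4 reachable subset states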
On the automatic Entropy-based construction of Probabilistic Automata in a Learning Robotic Scenario
When a robot interacts with the environment, producing changes through its own actions, it should find opportunities for learning and updating its own models of the environment. A robot that is able to construct discrete models of the underlying dynamical system which emerges from this interaction can guide its own behavior and adapt it based on feedback from the environment. Thus, the induction...